I began this project by searching the web for major issues facing residents of Toronto, especially families with working parents. My research showed that the Canadian daycare market has a well-established surplus of demand, resulting in anxiety-inducing waitlists (joined as early as the day a couple learns they’re expecting) and monthly fees that can amount to more than a mortgage payment. “It’s worse than finding a house or looking for an apartment.”
In the study, child care is divided into three categories: infant, toddler, and preschooler. The researchers define these categories as birth to two years for infants, 18 months to three years for toddlers, and two-and-a-half years to kindergarten age for preschoolers, which is age four or five, depending on the province. (The dataset analysed below also records Kindergarten and Gradelevel enrolment.)
These categories often see wildly different prices for full-time care, as a result of the smaller number of facilities available for infants, as well as the higher ratio of caregivers to children required by law. This means that while preschool age child care tends to be a lower figure overall, it gives the best idea of what the average family pays, since half of children in child care centres in Canada fall into this category.
But costs can be only half the battle for families when it comes to child care. The demand for child care greatly surpasses availability in most cases.
Researchers found that in almost three-quarters of the cities covered in the study, centres maintained lengthy waitlists of children waiting to enter care programs. Bigger cities see 80 to 90 per cent of centres maintaining a waitlist, while even smaller cities like St. John’s are in the 79 per cent range.
Waitlists are one of the major hurdles keeping parents from accessing quality child care, and families have to start the process early if they want to get their child enrolled. “You have to apply basically while you’re pregnant!”
Source: https://www.ctvnews.ca/features/analysis-daycare-fees-continue-to-rise-across-canada-1.3940099
Help individuals (parents) find the daycare center that best matches their criteria.
Encourage business owners to invest in this high-demand business by pointing them to the best locations.
Growth in this business would benefit both individuals and business owners. Some of the benefits are listed below:
A. Relieving parents of lengthy waitlists.
B. Cost reduction.
C. A good investment for business owners, as the service is in high demand.
%matplotlib inline
import numpy as np # library to handle data in a vectorized manner
from bs4 import BeautifulSoup
import geocoder # import geocoder
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
import random
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import since pandas 1.0)
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
from sklearn import preprocessing
from sklearn.datasets import make_blobs # the samples_generator submodule was removed from scikit-learn
import folium # map rendering library
print('Libraries imported.')
#reading table from wikipedia page
Toronto_df = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M", header=0,
attrs={"class":"wikitable sortable"})[0]
Toronto_df.head()
#lets drop Borough that has cells with 'Not assigned' values
Not_assigned = Toronto_df[Toronto_df['Borough'] == 'Not assigned'].index
# Delete these row indexes from dataFrame
Toronto_df.drop(Not_assigned , inplace=True)
Toronto_df.head()
# set 'Not assigned' cells in the Neighbourhood column to the name of the corresponding Borough
Toronto_df.loc[Toronto_df['Neighbourhood'] == 'Not assigned', ['Neighbourhood']] = Toronto_df['Borough']
Toronto_df.head()
#lets group Neighborhoods having the same postalcode
Toronto_df = Toronto_df.groupby(['Postcode','Borough'])['Neighbourhood'].apply(', '.join).reset_index()
Toronto_df.head()
Toronto_df.rename_axis("Postal Code", axis='index', inplace=True)
Toronto_df.head()
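The three cleaning steps above (drop unassigned boroughs, fall back to the borough name, merge neighbourhoods that share a postal code) can be tried on a small table without downloading anything. This is a minimal sketch; the sample rows are made up for illustration and only mimic the column names used above.

```python
import pandas as pd

# Hypothetical sample rows in the same shape as the Wikipedia table.
sample = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto",
                "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Harbourfront",
                      "Regent Park", "Not assigned"],
})

# 1. Drop rows whose Borough is not assigned.
cleaned = sample[sample["Borough"] != "Not assigned"].copy()

# 2. Fall back to the Borough name where the Neighbourhood is not assigned.
missing = cleaned["Neighbourhood"] == "Not assigned"
cleaned.loc[missing, "Neighbourhood"] = cleaned.loc[missing, "Borough"]

# 3. Combine neighbourhoods that share a postal code into one row.
cleaned = (cleaned.groupby(["Postcode", "Borough"])["Neighbourhood"]
                  .apply(", ".join)
                  .reset_index())
print(cleaned)
```

Selecting the mask first in step 2 keeps the assignment aligned row by row, which is easier to read than assigning a whole column to a filtered slice.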
# lets now import all Licensed child care centers in Toronto
child_center = pd.read_html('https://github.com/mkorogluNYC/NYC_Data_ScienceAcademy/blob/master/Shiny_project/toronto_cc.csv#L1008',header=0)[0]
child_center.head()
# lets see how many child care centers in Toronto
child_center.shape
#lets examine the total number of children enrolled in those centers
child_center['Total'].sum()
child_center.drop(['Unnamed: 0', 'Unnamed: 1'], axis=1, inplace=True)
child_center.rename(columns={'district':'Borough'}, inplace=True)
child_center.head(5)
Child care cost is one of the key components of a family's expenses. It becomes even more crucial for families where neither parent is available to take care of the child and the family cannot afford to pay the full child care fee. The child care fee subsidy is the main form of child care financing in each province of Canada and was introduced in the Canada Assistance Plan in 1972. Eligible families pay a share of the child care cost that depends on their income level. The City of Toronto's website lists the following conditions regarding the unavailability of each parent.
Each parent is either:
The eligible income range for the child care fee subsidy in the City of Toronto is up to $73,000. Families with lower incomes are eligible for a larger fee subsidy. This project aims to visualize the effectiveness of the child care fee subsidy using a dataset from the City of Toronto open data catalogue. The project has three purposes. First, users can find a child care centre based on their preferences for location, subsidy, and type of centre. Second, child care centres can be shown on a clustering map, where the user can filter the results by location, subsidy, and type of centre. Finally, key insights from the dataset are provided to help judge, at the policy level, how effective the fee subsidy program in the City of Toronto is.
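The first purpose, finding centres by location, subsidy, and type, amounts to filtering rows on up to three columns. Below is a minimal sketch of such a filter. The function name `find_centres` and the sample rows (including the "Y"/"N" subsidy values) are assumptions for illustration, not part of the dataset.

```python
import pandas as pd

def find_centres(centres, borough=None, subsidy=None, centre_type=None):
    """Return the rows that match every criterion that was supplied."""
    mask = pd.Series(True, index=centres.index)
    if borough is not None:
        mask &= centres["Borough"] == borough
    if subsidy is not None:
        mask &= centres["subsidy"] == subsidy
    if centre_type is not None:
        mask &= centres["type"] == centre_type
    return centres[mask]

# Hypothetical rows; the real values come from the City of Toronto dataset.
sample = pd.DataFrame({
    "name": ["Centre A", "Centre B", "Centre C", "Centre D"],
    "Borough": ["Scarborough", "Scarborough", "North York", "Etobicoke"],
    "subsidy": ["Y", "N", "Y", "Y"],
    "type": ["Non-Profit", "Commercial", "Non-Profit", "City-Operated"],
})

matches = find_centres(sample, borough="Scarborough", subsidy="Y")
print(matches["name"].tolist())
```

Leaving a criterion as `None` simply skips it, so the same function covers searches on one, two, or all three preferences.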
# total the numeric columns for each center, keyed by name, subsidy, type and Borough
subsidy_center = child_center.groupby(['name', 'subsidy','type','Borough'], as_index = False).sum()
subsidy_center.head(5)
#lets see how many centers provide fee_subsidy in Toronto
subsidy_center = child_center['subsidy'].value_counts()
# value_counts() returns a Series, which has no columns to rename, so only the index is labelled
subsidy_center.index.name = 'fee_subsidy'
subsidy_center
child_center.describe()
#lets see how many categories of child care centers there are and how many centers are in each category
center_type = child_center['type'].value_counts()
# value_counts() returns a Series, which has no columns to rename, so only the index is labelled
center_type.index.name = 'center_type'
center_type
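Because `value_counts()` returns a Series rather than a DataFrame, labelling it goes through the index name and the Series name. The sketch below shows one way to turn such counts into a two-column table. The sample values and the column name `n_centers` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical centre types; the real column is child_center['type'].
types = pd.Series(
    ["Non-Profit", "Commercial", "Non-Profit", "City-Operated", "Non-Profit"],
    name="type",
)

counts = types.value_counts()          # Series indexed by category
counts.index.name = "center_type"      # label the index
table = counts.rename("n_centers").reset_index()  # Series -> two-column DataFrame
print(table)
```

Naming both the index and the Series before `reset_index()` gives predictable column names regardless of the defaults of the installed pandas version.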
import matplotlib.pyplot as plt
ax = center_type.plot(kind='bar',figsize=(15, 8),width = 0.4,color = ['#5cb85c','#5bc0de','#d9534f'],edgecolor=None)
plt.title("No. of Child Care Centers in each Category in Toronto",fontsize= 16)
plt.xticks(fontsize=14)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
plt.yticks([])
# Annotate each bar with its height
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    ax.annotate("{:}".format(height), (p.get_x() + .45*width, p.get_y() + height + 4), fontsize=14)
#lets see how many child care centers in each Borough
center_loc = child_center['Borough'].value_counts()
center_loc.index.name = 'Borough'
center_loc
ax1 = center_loc.plot(kind='bar',figsize=(15, 8),width = 0.4 ,edgecolor=None)
plt.title("No. of Child Care Centers in Each Borough",fontsize= 16)
plt.xticks(fontsize=14)
ax1.spines['top'].set_visible(False)
ax1.spines['right'].set_visible(False)
ax1.spines['left'].set_visible(False)
plt.yticks([])
# Annotate each bar with its height
for p in ax1.patches:
    width, height = p.get_width(), p.get_height()
    ax1.annotate("{:}".format(height), (p.get_x() + .45*width, p.get_y() + height + 4), fontsize=14)
# lets see the distribution of children across categories in each Borough
df_test = child_center[['type','Borough', 'subsidy','Infant','Toddler','Preschooler','Kindergarten','Gradelevel','Total']]
df_grp = df_test.groupby(['Borough','subsidy', 'type'], as_index = False).sum()
df_grp.head()
# sum() takes no normalize argument; the chart below plots raw enrolment counts
df_grp1 = df_grp.groupby(['type','subsidy','Borough'])['Total'].sum()
df_grp1
df_grp1.plot(kind='barh', figsize=(25, 20), color='steelblue', edgecolor=None, fontsize= 20)
plt.xlabel('Number of enrolled children',fontsize= 16)
plt.title('Child Distribution Among Each Category in Each Borough',fontsize= 20)
# write the enrolment count next to each horizontal bar
for index, value in enumerate(df_grp1):
    label = format(int(value), ',')
    plt.annotate(label, xy=(value - 4, index - 0.10), color='black',fontsize= 20)
plt.show()
df_test1 = child_center[['type','Borough', 'subsidy','Infant','Toddler','Preschooler','Kindergarten','Gradelevel','Total']]
# 'subsidy' is text, so restrict the aggregation to the numeric enrolment columns
df_grp2 = df_test1.groupby(['Borough', 'type'], as_index = False).sum(numeric_only=True)
df_grp2
df_grp2.columns.values
df_grp2.index.values
print(type(df_grp2.columns))
print(type(df_grp2.index))
df_grp2.columns.tolist()
df_grp2.index.tolist()
print (type(df_grp2.columns.tolist()))
print (type(df_grp2.index.tolist()))
df_grp2.type # returns a series
df_grp2.set_index('type', inplace=True)
df_grp2
# city operated centers
city_op = df_grp2.loc['City-Operated', ['Borough','Infant', 'Toddler', 'Preschooler', 'Kindergarten', 'Gradelevel', 'Total']]
city_op.set_index('Borough', inplace=True)
city_op
# lets see what is the total number of children in city_operated centers in Toronto
child_num = city_op['Total'].sum()
child_num
ax4 = city_op.plot(kind='bar',figsize=(20, 10),width = 0.8,edgecolor=None)
plt.legend(city_op.columns,fontsize= 14)
plt.title("Distribution of children by Age enrolled in City_Operated centers in each Borough ",fontsize= 16)
plt.xticks(fontsize=14)
ax4.spines['top'].set_visible(False)
ax4.spines['right'].set_visible(False)
ax4.spines['left'].set_visible(False)
plt.yticks([])
# Annotate each bar with its height
for p in ax4.patches:
    width, height = p.get_width(), p.get_height()
    ax4.annotate("{:}".format(height), (p.get_x() + .057*width, p.get_y() + height + 2), fontsize=13)
Commer_cent = df_grp2.loc['Commercial', ['Borough','Infant', 'Toddler', 'Preschooler', 'Kindergarten', 'Gradelevel', 'Total']]
Commer_cent.set_index('Borough', inplace=True)
Commer_cent
# lets see what is the total number of children in commercial centers in Toronto
child_num1 = Commer_cent['Total'].sum()
child_num1
ax5 = Commer_cent.plot(kind='bar',figsize=(20, 10),width = 0.8,edgecolor=None)
plt.legend(Commer_cent.columns,fontsize= 14)
plt.title("Distribution of children by Age enrolled in Commercial centers in each Borough ",fontsize= 16)
plt.xticks(fontsize=14)
ax5.spines['top'].set_visible(False)
ax5.spines['right'].set_visible(False)
ax5.spines['left'].set_visible(False)
plt.yticks([])
# Annotate each bar with its height
for p in ax5.patches:
    width, height = p.get_width(), p.get_height()
    ax5.annotate("{:}".format(height), (p.get_x() + .059*width, p.get_y() + height + 3), fontsize=14)
non_profit = df_grp2.loc['Non-Profit', ['Borough','Infant', 'Toddler', 'Preschooler', 'Kindergarten', 'Gradelevel', 'Total']]
non_profit.set_index('Borough', inplace=True)
non_profit
# lets see what is the total number of children in Non-Profit centers in Toronto
child_num2 = non_profit['Total'].sum()
child_num2
ax6 = non_profit.plot(kind='bar',figsize=(20, 10),width = 0.8,edgecolor=None)
plt.legend(non_profit.columns,fontsize= 14)
plt.title("Distribution of children by Age enrolled in Non_Profit Centers in each Borough ",fontsize= 16)
plt.xticks(fontsize=14)
ax6.spines['top'].set_visible(False)
ax6.spines['right'].set_visible(False)
ax6.spines['left'].set_visible(False)
plt.yticks([])
# Annotate each bar with its height
for p in ax6.patches:
    width, height = p.get_width(), p.get_height()
    ax6.annotate("{:}".format(height), (p.get_x() + .006*width, p.get_y() + height + 7), fontsize=14)
# 68763 is the hard-coded total enrolment that each share is divided by
print('Percentage of children enrolled in Non-profit centers = {:.1f}%'.format(child_num2/68763* 100))
print('Percentage of children enrolled in Commercial centers = {:.1f}%'.format(child_num1/68763* 100))
print('Percentage of children enrolled in City-Operated centers = {:.1f}%'.format(child_num/68763* 100))
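The shares can also be derived from the data itself, so that no total has to be typed in by hand. This is a minimal sketch on made-up enrolment figures. The column names mirror the `type` and `Total` columns used above, but the numbers are purely illustrative.

```python
import pandas as pd

# Hypothetical enrolment per centre.
enrol = pd.DataFrame({
    "type": ["Non-Profit", "Non-Profit", "Commercial", "City-Operated"],
    "Total": [60, 40, 50, 50],
})

by_type = enrol.groupby("type")["Total"].sum()       # enrolment per centre type
share = (by_type / by_type.sum() * 100).round(1)     # percentage of all children

for centre_type, pct in share.items():
    print("Percentage of children enrolled in {} centers = {}%".format(centre_type, pct))
```

Dividing by `by_type.sum()` keeps the percentages consistent with whatever rows are currently in the frame, for example after filtering to one borough.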
child_center.columns = list(map(str, child_center.columns))
child_center.head()
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
#!conda install -c conda-forge folium=0.5.0 --yes
import folium
print('Folium installed and imported!')
# Use geopy library to get the latitude and longitude values of Toronto
address = 'Toronto, Ontario Canada'
geolocator = Nominatim(user_agent="foursquare_agent") # recent geopy versions require an explicit user_agent
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Canada are {}, {}.'.format(latitude, longitude))
# create map of Toronto using latitude and longitude values
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map (the loop variable is named center_kind so the built-in type() is not shadowed)
for lat, lng, name, center_kind, borough in zip(child_center['latitude'], child_center['longitude'], child_center['name'],
                                               child_center['type'], child_center['Borough']):
    label = '{},{},{}'.format(name, center_kind, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#87cefa',
        fill_opacity=0.5).add_to(toronto_map)
toronto_map
# Credentials to be Omitted!!!
# lets explore first child care center in our dataframe
child_center.loc[0]
# Get the child care center's latitude and longitude values.
childcare_center_latitude = child_center.loc[0, 'latitude'] # Center latitude value
childcare_center_longitude = child_center.loc[0, 'longitude'] # Center longitude value
childcare_center_name = child_center.loc[0, 'name'] # Center name
childcare_center_loc = child_center.loc[0, 'Borough']# Borough name
childcare_center_type = child_center.loc[0, 'type']# Center type
print('Latitude and longitude values of {} are {}, {}.'.format(childcare_center_name,
childcare_center_latitude,
childcare_center_longitude))
print('This child care center is in {} and it is {}.'.format(childcare_center_loc,childcare_center_type))
toronto_onehot = pd.get_dummies(child_center[['subsidy','type']], prefix="", prefix_sep="")
# add care center column back to dataframe
toronto_onehot['name'] = child_center['name']
# move care center column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()
toronto_onehot.shape
kclusters = 5
toronto_grouped_clustering = toronto_onehot.drop(columns='name') # a positional axis argument is no longer accepted by pandas 2
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(toronto_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:5]
# copy so the new columns do not modify child_center in place
toronto_merged = child_center.copy()
# add clustering labels
toronto_merged['Cluster Labels'] = kmeans.labels_
# join the one-hot subsidy/type columns back onto each center by name
toronto_merged = toronto_merged.join(toronto_onehot.set_index('name'), on='name')
toronto_merged.head() # check the last columns!
# average only the numeric columns within each cluster
toronto_merged.groupby('Cluster Labels').mean(numeric_only=True)
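The clustering step above one-hot encodes the categorical `subsidy` and `type` columns and then runs k-means on the resulting 0/1 matrix. The sketch below repeats that idea on six made-up centres with two clusters. The sample rows, the "Y"/"N" subsidy values, and the choice of `n_clusters=2` are assumptions for illustration only.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical centres described only by categorical features.
centres = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "subsidy": ["Y", "Y", "N", "N", "Y", "N"],
    "type": ["Non-Profit", "Non-Profit", "Commercial",
             "Commercial", "City-Operated", "Commercial"],
})

# One-hot encode the categories; cast to int so every column is numeric.
onehot = pd.get_dummies(centres[["subsidy", "type"]], prefix="", prefix_sep="").astype(int)

# Fit k-means on the encoded features and attach one label per centre.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=1).fit(onehot)
centres["Cluster Labels"] = kmeans.labels_
print(centres)
```

Centres with identical categories get identical feature rows, so they always land in the same cluster. The label numbers themselves are arbitrary and carry no ordering.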
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map, colored by each center's own cluster label
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['latitude'], toronto_merged['longitude'], toronto_merged['name'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters
# Take only the portion of the dataframe for Scarborough, the Borough with the fewest care centers (197).
toronto_scarborough = toronto_merged[toronto_merged['Borough'].str.contains("Scarborough")].reset_index(drop=True)
print(toronto_scarborough.shape)
toronto_scarborough.head()
# Re-create the map with new markers for scarborough Neighborhoods.
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map; labels come from the Scarborough rows themselves,
# because kmeans.labels_ is ordered like the full dataset, not like this subset
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_scarborough['latitude'], toronto_scarborough['longitude'], toronto_scarborough['name'], toronto_scarborough['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters
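When only a subset of rows is plotted, each row's cluster label has to come from that same subset. Zipping the subset against the full label array silently pairs rows with the wrong labels. The sketch below shows the difference on five made-up centres. The names and label values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical centres with one cluster label per row of the full dataset.
centres = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "Borough": ["Etobicoke", "Scarborough", "North York", "Scarborough", "Scarborough"],
})
labels = np.array([0, 1, 2, 3, 4])
centres["Cluster Labels"] = labels

subset = centres[centres["Borough"] == "Scarborough"].reset_index(drop=True)

# zip() stops at the shorter input, so this pairs B, D, E with the FIRST three labels.
misaligned = list(zip(subset["name"], labels))

# Reading the label column of the subset keeps every row with its own label.
aligned = list(zip(subset["name"], subset["Cluster Labels"]))

print(misaligned)
print(aligned)
```

Storing the labels as a column before filtering is what makes the aligned version possible, since the column travels with its rows through any later selection.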
# The following function retrieves the venues near each care center, given the names and coordinates, and stores them in a dataframe.
# CLIENT_ID, CLIENT_SECRET and VERSION come from the credentials cell, which is omitted above.
LIMIT=100
def getNearbyCenters(names, latitudes, longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
    # flatten the per-center lists into one dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['care_center',
                             'care_center Latitude',
                             'care_center Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']
    return nearby_venues
toronto_child_care = getNearbyCenters(names=toronto_scarborough['name'],
latitudes=toronto_scarborough['latitude'],
longitudes=toronto_scarborough['longitude']
)
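The response handling inside `getNearbyCenters` can be exercised without calling the API by feeding it a dictionary of the same shape. The sketch below does that. The helper name `flatten_explore_items` and the sample payload are hypothetical, and the payload only imitates the nested keys the function above reads (`response`, `groups`, `items`, `venue`).

```python
def flatten_explore_items(centre_name, lat, lng, payload):
    """Turn one explore-style response into a list of flat row tuples."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Guard against venues that come back without any category.
        category = categories[0]["name"] if categories else None
        rows.append((centre_name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hypothetical payload with the same nesting as the parsed JSON response.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Corner Park", "location": {"lat": 43.80, "lng": -79.20},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unnamed Spot", "location": {"lat": 43.81, "lng": -79.21},
               "categories": []}},
]}]}}

rows = flatten_explore_items("Centre A", 43.79, -79.19, sample_payload)
for row in rows:
    print(row)
```

Returning `None` for a missing category avoids an `IndexError` on venues with an empty category list, which the list comprehension above would raise.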
# lets Check size of resulting dataframe
print(toronto_child_care.shape)
toronto_child_care.head()
# Count of venues returned for each care center
toronto_child_care.groupby('care_center').count().head()
# lets see how many unique venue categories were returned by Foursquare
print('There are {} unique categories.'.format(len(toronto_child_care['Venue Category'].unique())))
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_child_care[['Venue Category']], prefix="", prefix_sep="")
# add care_center column back to dataframe
toronto_onehot['care_center'] = toronto_child_care['care_center']
# move care_center column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()
toronto_onehot.shape
toronto_grouped = toronto_onehot.groupby('care_center').mean().reset_index()
toronto_grouped.head()
# Check new size:
toronto_grouped.shape
# Let's print each care_center along with the top 5 most common venues
num_top_venues = 5
for neigh in toronto_grouped['care_center']:
    print("----"+neigh+"----")
    temp = toronto_grouped[toronto_grouped['care_center'] == neigh].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
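The same ranking can be collected into a dictionary instead of printed, which makes it easier to reuse later. This is a minimal sketch on a made-up frequency table. The centre names, categories, and frequencies are illustrative assumptions in the same layout as `toronto_grouped`.

```python
import pandas as pd

# Hypothetical mean category frequencies per care centre.
grouped = pd.DataFrame({
    "care_center": ["Centre A", "Centre B"],
    "Park": [0.5, 0.0],
    "Coffee Shop": [0.3, 0.6],
    "Pharmacy": [0.2, 0.4],
}).set_index("care_center")

num_top = 2
# For each centre, keep the names of the categories with the highest frequency.
top = {name: row.nlargest(num_top).index.tolist()
       for name, row in grouped.iterrows()}

for name, venues in top.items():
    print(name, venues)
```

`nlargest` sorts and truncates in one call, so the transpose and slicing used in the printing loop are not needed here.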
address = '5600 Sheppard Ave E, Scarborough, Canada'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)
search_query = 'child care center'
radius = 2500
print(search_query + ' .... OK!')
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url
results = requests.get(url).json()
results
# assign relevant part of JSON to venues
venues = results['response']['venues']
# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so later column assignments do not act on a view
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # explore-style responses nest the categories under 'venue.'
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)
# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
dataframe_filtered
dataframe_filtered.name
center_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around your location
# add a red circle marker to represent your location
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup=address,
    fill=True,
    fill_color='red',
    fill_opacity=0.6
).add_to(center_map)
# add child care centers as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(center_map)
# display map
center_map
venue_id = '5841b68aaf5c144c4edaffa9' # ID of Harry's Italian Pizza Bar
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
url
result = requests.get(url).json()
print(result['response']['venue'].keys())
result['response']['venue']
try:
    print(result['response']['venue']['rating'])
except KeyError:
    # the 'rating' key is absent when a venue has no rating
    print('This venue has not been rated yet.')
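An alternative to the try/except above is to walk the nested dictionary with `dict.get`, which returns `None` instead of raising when a key is missing. The helper name `venue_rating` and both sample responses below are hypothetical. They only imitate the `response -> venue -> rating` nesting read above.

```python
def venue_rating(result):
    """Return the venue rating, or None when the response carries no rating."""
    return result.get("response", {}).get("venue", {}).get("rating")

# Hypothetical responses: one rated venue and one without a rating.
rated = {"response": {"venue": {"name": "Venue A", "rating": 7.5}}}
unrated = {"response": {"venue": {"name": "Venue B"}}}

for result in (rated, unrated):
    rating = venue_rating(result)
    if rating is None:
        print("This venue has not been rated yet.")
    else:
        print(rating)
```

This form also tolerates a response that lacks the `venue` or `response` key entirely, for example when the request failed.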
This section provides some insights on how the effectiveness of the fee subsidy program is related to the number of child care centres across districts and centre types. First, I aim to visualize the aspects of child care centres for the City of Toronto. I use bar charts to visualize the data.
From the bar charts, the dataset can be summarized as follows:
Clustering was applied to the dataset based on the above categories. The 5 cluster labels were in the range [0, 1, 2, 3, 4], as follows:
Investigating the effectiveness of the fee subsidy program is multidimensional. It depends on data about fee subsidy applications in the City of Toronto, on the fee subsidy policy of child care centres, and on how many spaces these centres reserve for children with a fee subsidy. Furthermore, the 2016 Census can be used to uncover demographic features of each district and its wards. This may help us see which wards need more child care centres with a fee subsidy contract.